[BugFix] fix tcgen5mma example #1577
Conversation
👋 Hi! Thank you for contributing to the TileLang project. We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
📝 Walkthrough
The block_N parameter is reduced from 256 to 128 in the GEMM kernel configuration, and the A_shared and B_shared buffer shapes are adjusted accordingly. Loop logic and the GEMM operations remain unchanged. Inline comments are added to the T.copy statements for clarity.
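For reference, the shape relationship the walkthrough describes can be sketched as a generic TileLang tile-level GEMM. This is an illustrative reconstruction under assumed names and parameters (`matmul`, `block_M`, `block_K`, dtypes, thread count), not the actual gemm_tcgen5mma.py kernel.

```python
# Minimal TileLang-style sketch of the revised tile configuration
# (block_M = block_K = 128, block_N reduced from 256 to 128).
# Illustrative only: names, dtypes, and thread count are assumptions,
# and the real SM100 tcgen5mma example uses additional features.
import tilelang.language as T


def matmul(M, N, K, block_M=128, block_N=128, block_K=128,
           dtype="float16", accum_dtype="float"):

    @T.prim_func
    def main(
            A: T.Tensor((M, K), dtype),   # A is row-major (trans_A=False)
            B: T.Tensor((N, K), dtype),   # B is column-major (trans_B=True)
            C: T.Tensor((M, N), dtype),
    ):
        with T.Kernel(T.ceildiv(N, block_N), T.ceildiv(M, block_M),
                      threads=128) as (bx, by):
            # Shared tiles shrink along N now that block_N = 128.
            A_shared = T.alloc_shared((block_M, block_K), dtype)
            B_shared = T.alloc_shared((block_N, block_K), dtype)
            C_local = T.alloc_fragment((block_M, block_N), accum_dtype)

            T.clear(C_local)
            for ko in T.Pipelined(T.ceildiv(K, block_K), num_stages=2):
                T.copy(A[by * block_M, ko * block_K], A_shared)
                T.copy(B[bx * block_N, ko * block_K], B_shared)
                T.gemm(A_shared, B_shared, C_local, transpose_B=True)
            T.copy(C_local, C[by * block_M, bx * block_N])

    return main
```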
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/gemm_sm100/gemm_tcgen5mma.py (1)
41-42: Consider clarifying the inline comments.
The comments `# not trans_A` and `# trans_B` accurately reflect the configuration values, but they could be more descriptive about what they document. Consider rephrasing to clarify whether they describe the data layout or the configuration, for example: `# A is row-major (trans_A=False)` and `# B is column-major (trans_B=True)`.
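Applied to the two copies, the suggested wording would read roughly as follows; the indexing and buffer names are illustrative, not copied from the example file.

```python
# Illustrative only; indices and buffer names are assumptions, not the
# exact lines from examples/gemm_sm100/gemm_tcgen5mma.py.
T.copy(A[by * block_M, ko * block_K], A_shared)  # A is row-major (trans_A=False)
T.copy(B[bx * block_N, ko * block_K], B_shared)  # B is column-major (trans_B=True)
```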
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/gemm_sm100/gemm_tcgen5mma.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h.
Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). However, the layout template parameter should use sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h where Crosswise uses sizeof(ElementA) but the layout template uses sizeof_bits<ElementA>::value.
🔇 Additional comments (2)
examples/gemm_sm100/gemm_tcgen5mma.py (2)
55-55: LGTM! The block size reduction addresses the bug.
Reducing `block_N` from 256 to 128 significantly reduces shared memory usage per block (from ~160 KB to ~96 KB), which likely addresses hardware constraints or memory limitations specific to TensorCore Gen5 MMA operations. The dependent buffer shapes (A_shared, B_shared, C_shared) are properly adjusted by the existing logic.
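The quoted totals are consistent with 2-byte (fp16) shared buffers at block_M = block_K = 128, which is an assumption inferred from the example configuration rather than stated in the review:

$$
\begin{aligned}
A_\text{shared} &: 128 \times 128 \times 2\,\mathrm{B} = 32\,\mathrm{KB} \\
B_\text{shared} &: 256 \times 128 \times 2\,\mathrm{B} = 64\,\mathrm{KB} \;\to\; 128 \times 128 \times 2\,\mathrm{B} = 32\,\mathrm{KB} \\
C_\text{shared} &: 128 \times 256 \times 2\,\mathrm{B} = 64\,\mathrm{KB} \;\to\; 128 \times 128 \times 2\,\mathrm{B} = 32\,\mathrm{KB} \\
\text{Total} &: \approx 160\,\mathrm{KB} \;\to\; \approx 96\,\mathrm{KB}
\end{aligned}
$$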
58-58: No action needed. The transition from `num_stages = 0` to `num_stages = 2` is intentional behavior by design. The conditional logic explicitly enables 2-stage pipelining when all block dimensions are smaller than 256, and disables it when any dimension is 256 or larger. With the new tile configuration (128×128×128), this results in the correct `num_stages = 2` setting. The code includes validation tests that confirm correct compilation and numerical accuracy.
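A minimal paraphrase of that selection rule, assuming the three tile dimensions above; the exact expression in the example may differ.

```python
# Hedged paraphrase of the stage-selection rule described in the comment above;
# the exact expression in gemm_tcgen5mma.py may differ.
block_M, block_N, block_K = 128, 128, 128

# Enable 2-stage pipelining only when every block dimension is below 256.
num_stages = 2 if all(dim < 256 for dim in (block_M, block_N, block_K)) else 0

assert num_stages == 2  # the new 128x128x128 tiles enable pipelining
```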
Summary by CodeRabbit
Documentation
Chores